Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
Hu, Yuanzhe, Wang, Yu, McAuley, Julian
Recent benchmarks for Large Language Model (LLM) agents primarily focus on evaluating reasoning, planning, and execution capabilities, while another critical component, memory (how agents memorize, update, and retrieve long-term information), remains under-evaluated due to the lack of benchmarks. We refer to agents with memory mechanisms as memory agents. In this paper, drawing on classic theories from memory science and cognitive science, we identify four core competencies essential for memory agents: accurate retrieval, test-time learning, long-range understanding, and selective forgetting. Existing benchmarks either rely on limited context lengths or are tailored to static, long-context settings such as book-based QA, which do not reflect the interactive, multi-turn nature of memory agents that accumulate information incrementally. Moreover, no existing benchmark covers all four competencies. We introduce MemoryAgentBench, a new benchmark specifically designed for memory agents. Our benchmark transforms existing long-context datasets, together with newly constructed ones, into a multi-turn format, effectively simulating the incremental information processing characteristic of memory agents. By carefully selecting and curating datasets, the benchmark provides comprehensive coverage of the four core memory competencies outlined above, thereby offering a systematic and challenging testbed for assessing memory quality. We evaluate a diverse set of memory agents, ranging from simple context-based and retrieval-augmented generation (RAG) systems to advanced agents with external memory modules and tool integration. Empirical results reveal that current methods fall short of mastering all four competencies, underscoring the need for further research into comprehensive memory mechanisms for LLM agents.
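The incremental, multi-turn setting the abstract describes can be sketched in a few lines. This is our own toy illustration, not MemoryAgentBench code: `chunk_document` and `NaiveMemoryAgent` are hypothetical names, and the "agent" is a trivial baseline whose memory is just the raw turn history.

```python
# Toy sketch of the multi-turn memory setting (not MemoryAgentBench itself):
# a long context is chunked and streamed to an agent one turn at a time,
# so information must be memorized incrementally rather than read at once.

def chunk_document(text: str, chunk_size: int = 8) -> list[str]:
    """Split a long context into fixed-size word chunks, one per turn."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

class NaiveMemoryAgent:
    """Trivial baseline: memory is the raw turn history."""

    def __init__(self) -> None:
        self.memory: list[str] = []

    def ingest(self, turn: str) -> None:  # memorize / update
        self.memory.append(turn)

    def answer(self, question: str) -> str:  # retrieval (naive)
        # A real agent retrieves selectively; here we simply return the
        # stored turn with the largest word overlap with the question.
        q = set(question.lower().split())
        return max(self.memory,
                   key=lambda t: len(q & set(t.lower().split())))

agent = NaiveMemoryAgent()
doc = "The meeting is on Friday. " * 3 + "Alice owns the red car."
for turn in chunk_document(doc):
    agent.ingest(turn)
print(agent.answer("Who owns the red car?"))
```

A stronger agent would replace `ingest` and `answer` with embedding-based retrieval or an external memory module; the benchmark's four competencies each stress a different part of this loop.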
Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization
Kristiadi, Agustinus, Immer, Alexander, Eschenhagen, Runa, Fortuin, Vincent
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. It is theoretically compelling, since it can be seen as a Gaussian process posterior with the mean function given by the neural network's maximum-a-posteriori predictive function and the covariance function induced by the empirical neural tangent kernel. However, while its efficacy has been studied in large-scale tasks like image classification, it has not been studied in sequential decision-making problems like Bayesian optimization, where Gaussian processes, with simple mean functions and kernels such as the radial basis function, are the de facto surrogate models. In this work, we study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility. However, we also present some pitfalls that might arise, as well as a potential problem with the LLA when the search space is unbounded.
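The Gaussian-process view of the LLA mentioned in the abstract can be written out explicitly. In standard notation (ours, not the authors'), let $\theta_*$ be the MAP estimate and $J_*(x) = \nabla_\theta f(x;\theta)\big|_{\theta_*}$ the network Jacobian; linearizing the network around $\theta_*$ gives

```latex
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_*) + J_*(x)^\top (\theta - \theta_*),
\qquad \theta \sim \mathcal{N}(\theta_*, \Sigma),
```

which induces the Gaussian-process posterior

```latex
f_{\mathrm{lin}}(x) \sim \mathcal{GP}\!\bigl(f(x;\theta_*),\;
J_*(x)^\top \Sigma\, J_*(x')\bigr),
\qquad
\Sigma = \Bigl(\sum_{n} J_*(x_n)\,\Lambda_n\,J_*(x_n)^\top + \delta I\Bigr)^{-1},
```

where $\Sigma$ is the (generalized Gauss-Newton) Laplace covariance, $\Lambda_n$ the per-datum Hessian of the loss, and $\delta$ the prior precision; $J_*(x)^\top J_*(x')$ is the empirical neural tangent kernel. This makes explicit the contrast drawn in the abstract: the mean function is the trained network's prediction rather than a simple constant, and the kernel is data-dependent rather than a fixed radial basis function.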
'Nothing to do, nowhere to go': What happens when elephants live alone
On a raw December day, as Christmas music blares over loudspeakers, an African elephant named Asha walks in tight circles in an enclosure at Natural Bridge Zoo, a roadside attraction in Virginia. Her living quarters consist of a barn and three outdoor yards: a fenced patch of grass about 90 by 40 feet, a dirt patch with a few logs scattered about, and a yard where she gives rides to children for $15 and where her massive feet have worn a ring into the grass. Her space is barren: no shrubs, trees, or watering holes. Elephants, like humans, are social animals. In the wild, females typically live in herds of eight or more, yet Asha, who's nearly 40 years old, has been confined mostly alone for more than 30 years.
An Inconsistency-Tolerant Approach to Information Merging Based on Proposition Relaxation
Schockaert, Steven (Ghent University) | Prade, Henri (Université Paul Sabatier)
Inconsistencies between different information sources may arise because of statements that are inaccurate, albeit not completely false. In such scenarios, the most natural way to restore consistency is often to interpret assertions in a more flexible way, i.e. to enlarge (or relax) their meaning. As this process inherently requires extra-logical information about the meaning of atoms, extensions of classical merging operators are needed. In this paper, we introduce syntactic merging operators, based on possibilistic logic, which employ background knowledge about the similarity of atomic propositions to appropriately relax propositional statements.
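The relaxation idea can be illustrated with a toy sketch. This is our own illustration, not the paper's possibilistic-logic operators: the `SIMILARITY` table, the mutual-exclusivity reading of the atoms, and all function names are hypothetical. Two sources that assert conflicting atoms ("red" vs. "crimson") become jointly satisfiable once each atom's meaning is enlarged to its similarity neighborhood.

```python
# Toy sketch of relaxation-based merging (not the paper's operators).
# Each source asserts the object's colour as one atom; atoms are read as
# mutually exclusive, so distinct atoms conflict outright. Relaxing an
# atom enlarges its meaning to all sufficiently similar atoms, and merging
# intersects the relaxed meanings.

# Hypothetical background knowledge about atomic-proposition similarity.
SIMILARITY = {("red", "crimson"): 0.9, ("red", "orange"): 0.6,
              ("blue", "navy"): 0.9}
ATOMS = {"red", "crimson", "orange", "blue", "navy"}

def sim(a: str, b: str) -> float:
    """Symmetric similarity between atoms; identical atoms score 1.0."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))

def relax(atom: str, threshold: float) -> set[str]:
    """Enlarge `atom`'s meaning to every atom at least `threshold`-similar."""
    return {b for b in ATOMS if sim(atom, b) >= threshold}

def merge(assertions: list[str], threshold: float) -> set[str]:
    """Intersect relaxed meanings; an empty result means still inconsistent."""
    result = set(ATOMS)
    for atom in assertions:
        result &= relax(atom, threshold)
    return result

# "red" vs "crimson" conflict at threshold 1.0 but agree once relaxed.
print(merge(["red", "crimson"], threshold=1.0))  # set() -- inconsistent
print(merge(["red", "crimson"], threshold=0.9))  # {'red', 'crimson'}
```

In the paper's setting the threshold is not fixed in advance: relaxation proceeds gradually (here, by lowering the threshold) until consistency is restored, so the merged result enlarges assertions no more than necessary.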